Ridge-SimSel: A generalization of the variable selection method SimSel to multicollinear data sets

Author

  • Martin Eklund
Abstract

Variable selection is an important part of most applications of statistical modeling. In biological applications the variable selection problem is often demanding, because the true underlying function is in general nonlinear and datasets often contain multicollinearities, measurement-error variables, and outliers. We here introduce Ridge-SimSel, a simulation-based variable selection method that is quite insensitive to these complexities. Ridge-SimSel generalizes the previously described SimSel method to handle multicollinear datasets. It works by fitting an approximate model to the data, which is then used to study how disturbances of the independent variables affect the fit of that model. The main idea is that disturbing unimportant variables does not affect the quality of the model fit.
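
The disturbance idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' Ridge-SimSel procedure, whose details are not given here. It only assumes a closed-form ridge fit as the approximate model, Gaussian noise as the disturbance, and the change in residual sum of squares as the measure of fit. The function names (ridge_rss, disturbance_effect, make_demo_data), the penalty lam, the noise scale, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np


def ridge_rss(X, y, lam):
    """Fit ridge regression in closed form on centered data; return the RSS."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    p = X.shape[1]
    # Ridge estimate: (X'X + lam * I)^{-1} X'y
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    resid = yc - Xc @ beta
    return float(resid @ resid)


def disturbance_effect(X, y, lam=1.0, noise_sd=1.0, n_rep=50, seed=0):
    """Average increase in ridge RSS when one column at a time is disturbed.

    Assumption: the disturbance is Gaussian noise whose standard deviation
    is noise_sd times the column's own standard deviation.
    """
    rng = np.random.default_rng(seed)
    base = ridge_rss(X, y, lam)
    n, p = X.shape
    effect = np.zeros(p)
    for j in range(p):
        for _ in range(n_rep):
            Xd = X.copy()
            Xd[:, j] += rng.normal(0.0, noise_sd * X[:, j].std(), size=n)
            effect[j] += ridge_rss(Xd, y, lam) - base
    return effect / n_rep


def make_demo_data(n=200, seed=1):
    """Synthetic data (hypothetical): columns 0 and 1 are nearly collinear,
    column 2 is unrelated to y, column 3 drives y directly."""
    rng = np.random.default_rng(seed)
    x0 = rng.normal(size=n)
    x1 = x0 + 0.05 * rng.normal(size=n)
    x2 = rng.normal(size=n)
    x3 = rng.normal(size=n)
    X = np.column_stack([x0, x1, x2, x3])
    y = 2.0 * x0 + 3.0 * x3 + 0.1 * rng.normal(size=n)
    return X, y


if __name__ == "__main__":
    X, y = make_demo_data()
    for j, e in enumerate(disturbance_effect(X, y)):
        print(f"variable {j}: mean RSS increase = {e:.2f}")
```

In this sketch, a variable whose disturbance leaves the fit nearly unchanged is a candidate for removal, while a large average increase in the residual sum of squares suggests the fit depends on that variable. How Ridge-SimSel turns such changes into a selection decision, and how it treats the collinear pair, is not specified in the abstract and is not reproduced here.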


Similar resources

SimSel - a new simulation feature selection method I

In pharmaceutical research there are data sets describing the interactions between proteins and molecules. The data sets include a huge number of independent variables (features) and the response variable is typically the binding strength. Thus, one of the most challenging problems is to find the features that have a real influence on the binding strength. Here we present a feature selection me...


Application of Ridge Regression to Multicollinear Data

The main thrust of this paper is to investigate the ridge regression problem in multicollinear data. The properties of the ridge estimator are discussed. Variance inflation factors, eigenvalues, and the standardization problem are studied through an empirical comparison between OLS and ridge regression, regressing the number of persons employed on five variables. Methods to choose biasing paramete...


Feature Selection for Ridge Regression with Provable Guarantees

We introduce single-set spectral sparsification as a deterministic sampling-based feature selection technique for regularized least-squares classification, which is the classification analog to ridge regression. The method is unsupervised and gives worst-case guarantees of the generalization power of the classification function after feature selection with respect to the classification function...


Selection of Variables that Influence Drug Injection in Prison: Comparison of Methods with Multiple Imputed Data Sets

Background: Prisoners, compared to the general population, are at greater risk of infection. Drug injection is the main route of HIV transmission, in particular in Iran. What would be of interest is to determine variables that govern drug injection among prisoners. However, one of the issues that challenge model building is incomplete national data sets. In this paper, we addressed the process ...


A Comparison between New Estimation and variable Selection method in Regression models by Using Simulation

In this paper, some new methods that have very recently been introduced for parameter estimation and variable selection in regression models are reviewed. Furthermore, we simulate several models in order to evaluate the performance of these methods under different situations. Finally, we compare the performance of these methods with that of the regular traditional variable selection methods such ...




Publication date: 2009